Talk to your neighbour:
\[\begin{align} \text{Pr}(n \text{ distinct birthdays}) &= \frac{365}{365} \times \frac{364}{365} \times \dots \times \frac{365 - (n -1)}{365} \\ & \\ &= \prod_{i=0}^{n-1} \left\{\frac{365-i}{365}\right\}. \end{align}\]
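The product above is easy to evaluate numerically. A minimal sketch, assuming 365 equally likely birthdays and no leap years (the function name `p_distinct` is made up for illustration):

```r
# Probability that n people all have distinct birthdays,
# assuming 365 equally likely days (no 29 February).
p_distinct <- function(n) {
  # Terms are 365/365, 364/365, ..., (365 - (n - 1))/365
  prod((365 - seq_len(n) + 1) / 365)
}

# Probability that at least two of n people share a birthday
p_shared <- function(n) 1 - p_distinct(n)

p_shared(23)
```

With 23 people the chance of at least one shared birthday is already just over one half.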
\[\widehat p = \frac{\text{relative count}}{\text{total relative count}}\]
\[\widehat{p}_{01/01} = 0.00237 \approx \frac{1}{421}, \quad \widehat{p}_{25/12} = 0.00214 \approx \frac{1}{467}, \quad \widehat{p}_{26/12} = 0.00204 \approx \frac{1}{490}.\]
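The estimator is just a normalisation: divide each day's relative count by the sum over all days so that the estimates add up to one. A small sketch with made-up values (the vector `rel_count` and its four entries are hypothetical and are not the real birth data):

```r
# Hypothetical relative birth counts for four days (illustrative values only)
rel_count <- c(day_a = 0.9, day_b = 1.0, day_c = 1.1, day_d = 1.0)

# Estimated probability of each day: relative count / total relative count
p_hat <- rel_count / sum(rel_count)

p_hat
sum(p_hat)  # the estimates sum to 1
```

Applied to all days of the year, the same calculation gives one estimated probability per calendar day.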
\[ H_0: \text{All non-holidays have the same mean birth count.}\]
\[\text{vs.}\]
\[H_1: \text{At least one non-holiday has a different mean birth count.}\]
# 1: set up data & storage
boot_data <- as.vector(as.matrix(non_hols[,3:24]))
boot_sample <- rep(NA,22)
boot_size <- 1e5
boot_means <- rep(NA,boot_size)
# 2: make fake data sets where day does not matter
for (i in 1:boot_size) {
  boot_sample <- sample(boot_data, size = 22, replace = TRUE)
  boot_means[i] <- mean(boot_sample, na.rm = TRUE)
}
# 3: find "typical" range if day does not influence count
boot_CI <- quantile(boot_means, probs = c(0.025,0.975))
boot_CI
    2.5%    97.5%
1737.091 1909.900
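One way to use this range is to compare each day's observed mean count against it and flag the days that fall outside. The sketch below uses simulated stand-in data, because the layout of `non_hols` is only assumed here (one row per day, one column per year); `toy_counts`, `toy_CI` and the Poisson rate of 1800 are all hypothetical.

```r
# Hypothetical stand-in for the real data: 5 days x 22 years of counts
set.seed(1)
toy_counts <- matrix(rpois(5 * 22, lambda = 1800), nrow = 5)
toy_counts[5, ] <- toy_counts[5, ] + 300  # make day 5 deliberately unusual

# Bootstrap the mean of 22 pooled counts, ignoring which day they came from
pooled <- as.vector(toy_counts)
toy_boot_means <- replicate(
  2000,
  mean(sample(pooled, size = 22, replace = TRUE))
)
toy_CI <- quantile(toy_boot_means, probs = c(0.025, 0.975))

# Flag days whose observed mean lies outside the "typical" range
day_means <- rowMeans(toy_counts)
outside <- day_means < toy_CI[1] | day_means > toy_CI[2]

outside
```

A day flagged `TRUE` has a mean that would be unusual if the day of the year did not influence the birth count.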
We could take this analysis much further:
If you want to learn more: https://pudding.cool/2018/04/birthday-paradox/
Applicant Visit Day Talk - Zak Varty